Unsupervised Learning: Trade&Ahead

Marks: 60

Context

The stock market has consistently proven to be a good place to invest and save for the future. There are many compelling reasons to invest in stocks. Doing so can help fight inflation, create wealth, and provide some tax benefits. Steady returns on investments over a long period of time can also grow far more than seems possible. Thanks to the power of compound interest, the earlier one starts investing, the larger the corpus one can have for retirement. Overall, investing in stocks can help meet life's financial aspirations.

It is important to maintain a diversified portfolio when investing in stocks in order to maximize earnings under any market condition. A diversified portfolio tends to yield higher returns and carry lower risk, because it tempers potential losses when the market is down. It is easy to get lost in a sea of financial metrics while determining the worth of a stock, and doing the same for a multitude of stocks to identify the right picks for an individual can be a tedious task. With a cluster analysis, one can identify stocks that exhibit similar characteristics and ones that exhibit minimum correlation. This helps investors better analyze stocks across different market segments and protect against risks that could make the portfolio vulnerable to losses.

Objective

Trade&Ahead is a financial consultancy firm that provides its customers with personalized investment strategies. They have hired you as a Data Scientist and provided you with data comprising stock prices and some financial indicators for a few companies listed on the New York Stock Exchange. They have assigned you the tasks of analyzing the data, grouping the stocks based on the attributes provided, and sharing insights about the characteristics of each group.

Data Dictionary

Importing necessary libraries and data

Data Overview

The dataset has 340 rows and 15 columns. The first 4 columns are object type and hold string values. No missing values are apparent, since every column contains 340 non-null entries. The remaining columns are int64 or float64, so they are suitable for cluster analysis.
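The checks behind this overview can be sketched roughly as follows. The small DataFrame is a hypothetical stand-in for the real data, and the column names are assumptions used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the stock dataset; names and values are illustrative only.
df = pd.DataFrame({
    "Ticker Symbol": ["AAA", "BBB", "CCC"],
    "GICS Sector": ["Financials", "Health Care", "Financials"],
    "Current Price": [42.1, 59.3, 17.8],
    "Cash Ratio": [51, 77, 130],
})

# Shape of the data: (rows, columns).
n_rows, n_cols = df.shape

# Count of missing values in each column.
missing_per_column = df.isnull().sum()

# Columns with numeric dtypes, i.e. the candidates for clustering.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(n_rows, n_cols)
print(missing_per_column)
print(numeric_cols)
```

On the real data, `df.info()` gives the same dtype and non-null summary in one call.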

Exploratory Data Analysis (EDA)

Questions:

  1. What does the distribution of stock prices look like?
  2. The stocks of which economic sector have seen the maximum price increase on average?
  3. How are the different variables correlated with each other?
  4. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?
  5. P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?

Data Preprocessing

The dataset has 340 rows and 15 columns. The first 4 columns are object type and hold string values; I will drop them for the purpose of this analysis. The remaining columns are int64 or float64, so they are suited to an in-depth analysis.

No missing values in the dataset.

The magnitude of the values varies widely between columns. Before analysis, the data will be normalized and problematic outliers will be treated, so that no single large-valued variable dominates the distance calculations used in clustering.

No duplicate rows exist in the dataframe.

Questions:

  1. What does the distribution of stock prices look like?

Histogram Summary

Questions:

  1. How are the different variables correlated with each other?

Correlation Comparison

No pairwise correlation in the data is higher than 0.60 or lower than -0.4, so there are no strong linear correlations in the dataset. This is helpful for clustering, since it suggests the input variables are not largely redundant with one another.

The highest correlations are between Net Income and Estimated Shares Outstanding at 0.59, and between Net Income and Earnings Per Share at 0.56.
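One way to list the strongest pairwise correlations is sketched below. The data is synthetic and the column names are assumptions, so the numbers it prints are not the ones reported above.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; two columns share a common signal.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
df = pd.DataFrame({
    "Net Income": base + rng.normal(scale=0.5, size=200),
    "Earnings Per Share": base + rng.normal(scale=1.0, size=200),
    "Cash Ratio": rng.normal(size=200),
})

# Pearson correlation matrix of the numeric columns.
corr = df.corr()

# Keep each pair once: upper triangle only, diagonal excluded.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(mask)
    .stack()
    .dropna()
    .sort_values(key=abs, ascending=False)
)

# The pair with the largest absolute correlation.
strongest_pair = pairs.index[0]
print(pairs)
```

Sorting by absolute value keeps strong negative correlations near the top as well.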

Questions:

  1. The stocks of which economic sector have seen the maximum price increase on average?

Questions:

  1. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?

Information Technology, Financials, and Health Care are the three sectors with the highest average cash ratio. The remaining sectors have lower average cash ratios.
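A per-sector average like this can be computed with a group-by. The sketch below uses made-up values, and the column name `Cash Ratio` is an assumption.

```python
import pandas as pd

# Made-up values for illustration; not the real dataset.
df = pd.DataFrame({
    "GICS Sector": [
        "Information Technology", "Information Technology",
        "Financials", "Utilities", "Utilities",
    ],
    "Cash Ratio": [150, 90, 100, 10, 20],
})

# Mean cash ratio per sector, highest first.
avg_cash_ratio = (
    df.groupby("GICS Sector")["Cash Ratio"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_cash_ratio)
```

The same pattern applies to the price-change and P/E ratio questions by swapping the column being averaged.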

Questions:

  1. P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?

All numeric columns will be floored and capped at their whisker values to limit the influence of outliers on the cluster model.

Outliers have been treated by replacing any value below the bottom whisker or above the top whisker with that whisker value.
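A minimal sketch of this flooring and capping follows. It assumes the whiskers sit at 1.5 times the interquartile range beyond the quartiles, which is the usual boxplot convention but is not stated in the notebook. The helper name `cap_to_whiskers` is hypothetical.

```python
import pandas as pd

def cap_to_whiskers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (assumed whisker definition)."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Values outside the whiskers are replaced by the nearest whisker value.
    return series.clip(lower=lower, upper=upper)

# Illustrative values with one extreme outlier.
prices = pd.Series([10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 500.0])
capped = cap_to_whiskers(prices)
print(capped)
```

Capping keeps every row in the dataset, unlike dropping outliers, so the number of stocks available for clustering is unchanged.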

The scaled data now shows all values within a similar magnitude, and the outliers have been capped.
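The scaling step could look like the sketch below. It assumes z-score standardization with scikit-learn's `StandardScaler`; the notebook does not say which scaler was used, and the values are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative numeric columns on very different scales.
numeric = pd.DataFrame({
    "Current Price": [10.0, 20.0, 30.0, 40.0],
    "Net Income": [1e6, 3e6, 2e6, 6e6],
})

# Standardize each column to zero mean and unit variance (assumed method).
scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,
)
print(scaled)
```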

EDA

The outliers have been capped, and all variables are now scaled to a similar magnitude. Not every distribution appears normal, but most are roughly normal.

K-means Clustering

Four clusters give a good silhouette score and sit at a reasonable elbow in the k-means elbow plot.
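The elbow and silhouette comparison can be sketched as below. The data is synthetic, generated with four well-separated groups, so it only illustrates the procedure and does not reproduce the result on the stock data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four well-separated groups (illustration only).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.6,
    random_state=42,
)

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette score per k

for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, labels)

# k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```

Plotting `inertias` against k gives the elbow curve; the silhouette scores offer a second opinion when the elbow is ambiguous.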

Insights

Hierarchical Clustering

The linkage methods below are compared using Euclidean distance only.

The highest cophenetic correlation is obtained with Euclidean distance and centroid linkage.
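A comparison of linkage methods by cophenetic correlation might look like the following sketch. It uses SciPy on random data, so which method scores highest here says nothing about the stock data.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Random data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))

# Condensed matrix of pairwise Euclidean distances.
distances = pdist(X, metric="euclidean")

scores = {}
for method in ["single", "complete", "average", "weighted", "centroid", "ward"]:
    # Build the hierarchy, then correlate its cophenetic distances
    # with the original pairwise distances.
    Z = linkage(X, method=method, metric="euclidean")
    coph_corr, _ = cophenet(Z, distances)
    scores[method] = coph_corr

best_method = max(scores, key=scores.get)
print(scores)
print(best_method)
```

A cophenetic correlation closer to 1 means the dendrogram preserves the original pairwise distances more faithfully.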

Dendrograms for the linkage methods

Observations

Nine clusters distribute the GICS Sectors well across the clusters.
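To see how sectors fall across clusters, the hierarchy can be cut into at most nine groups and cross-tabulated against sector. In this sketch the data is random, and average linkage is an illustrative choice, since the notebook does not state which linkage produced the final clusters.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Random features and sector labels for illustration only.
rng = np.random.default_rng(1)
features = rng.normal(size=(90, 3))
sectors = rng.choice(["Financials", "Health Care", "Utilities"], size=90)

# Build the hierarchy (average linkage is an assumption made for this sketch).
Z = linkage(features, method="average", metric="euclidean")

# Cut the tree so that no more than 9 flat clusters are formed.
labels = fcluster(Z, t=9, criterion="maxclust")

# Count of stocks per (cluster, sector) combination.
profile = pd.crosstab(labels, sectors, rownames=["cluster"], colnames=["GICS Sector"])
print(profile)
```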

Dimensionality reduction with PCA

The clusters appear well separated from each other, with some small exceptions for Clusters 3 and 4. Nine clusters seem to be a good match for dividing the data.
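Projecting the scaled data onto two principal components for a visual check of cluster separation can be sketched as follows. The data is random and the number of features is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in for the scaled feature matrix (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))

# Reduce to two components so the clusters can be drawn on a scatter plot.
pca = PCA(n_components=2, random_state=0)
components = pca.fit_transform(X)

# Share of total variance retained by the two components.
explained = pca.explained_variance_ratio_.sum()
print(components.shape, explained)
```

Coloring a scatter plot of `components` by cluster label shows the separation; the retained variance indicates how much of the original structure the 2D view can reflect.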

K-means vs Hierarchical Clustering


Actionable Insights and Recommendations
